Overview

Dataset statistics

Number of variables: 23
Number of observations: 1296675
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 269.7 MiB
Average record size in memory: 218.1 B
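
The overview figures can be approximated directly with pandas. The sketch below is illustrative only: the helper name `dataset_statistics` and the tiny demo frame are hypothetical, and whether the report measures memory with `deep=True` or `deep=False` is not stated, so the flag is left as a parameter.

```python
import pandas as pd


def dataset_statistics(df: pd.DataFrame, deep: bool = False) -> dict:
    """Recompute overview-style figures for an arbitrary DataFrame."""
    n_rows, n_cols = df.shape
    n_cells = n_rows * n_cols
    missing = int(df.isna().sum().sum())          # NaN/None cells
    duplicates = int(df.duplicated().sum())       # rows equal to an earlier row
    memory = int(df.memory_usage(deep=deep).sum())
    return {
        "n_variables": n_cols,
        "n_observations": n_rows,
        "missing_cells": missing,
        "missing_cells_pct": missing / n_cells if n_cells else 0.0,
        "duplicate_rows": duplicates,
        "duplicate_rows_pct": duplicates / n_rows if n_rows else 0.0,
        "memory_bytes": memory,
        "avg_record_bytes": memory / n_rows if n_rows else 0.0,
    }


# Tiny illustrative frame (not the profiled dataset).
demo = pd.DataFrame(
    {"amt": [1.0, 2.5, None, 2.5], "state": ["TX", "NY", "NY", "NY"]}
)
stats = dataset_statistics(demo)
```

On the profiled data, the same call would be expected to report 23 variables and 1296675 observations, with the memory figures depending on the `deep` setting.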

Variable types

Numeric: 10
Categorical: 13

Alerts

trans_date_trans_time has a high cardinality: 1274791 distinct values [High cardinality]
merchant has a high cardinality: 693 distinct values [High cardinality]
first has a high cardinality: 352 distinct values [High cardinality]
last has a high cardinality: 481 distinct values [High cardinality]
street has a high cardinality: 983 distinct values [High cardinality]
city has a high cardinality: 894 distinct values [High cardinality]
state has a high cardinality: 51 distinct values [High cardinality]
job has a high cardinality: 494 distinct values [High cardinality]
dob has a high cardinality: 968 distinct values [High cardinality]
trans_num has a high cardinality: 1296675 distinct values [High cardinality]
Unnamed: 0 is highly overall correlated with unix_time [High correlation]
zip is highly overall correlated with long and 2 other fields [High correlation]
lat is highly overall correlated with merch_lat and 1 other fields [High correlation]
long is highly overall correlated with zip and 2 other fields [High correlation]
unix_time is highly overall correlated with Unnamed: 0 [High correlation]
merch_lat is highly overall correlated with lat and 1 other fields [High correlation]
merch_long is highly overall correlated with zip and 2 other fields [High correlation]
state is highly overall correlated with zip and 4 other fields [High correlation]
is_fraud is highly imbalanced (94.9%) [Imbalance]
amt is highly skewed (γ1 = 42.27787379) [Skewed]
Unnamed: 0 is uniformly distributed [Uniform]
trans_date_trans_time is uniformly distributed [Uniform]
trans_num is uniformly distributed [Uniform]
Unnamed: 0 has unique values [Unique]
trans_num has unique values [Unique]
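
The report does not say how the imbalance percentage for is_fraud is derived. A common choice, assumed here rather than confirmed, is one minus the Shannon entropy of the class distribution normalised by the maximum possible entropy. The function name `imbalance_score` and the class counts below are hypothetical and purely illustrative.

```python
import math
from collections import Counter


def imbalance_score(labels) -> float:
    """Return 1 - H(p) / log2(k): 0 for balanced classes, near 1 for extreme imbalance.

    Assumption: this mirrors an entropy-based imbalance measure; the report
    itself does not state its formula.
    """
    counts = Counter(labels)
    n_classes = len(counts)
    if n_classes < 2:
        return 0.0
    total = sum(counts.values())
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return 1.0 - entropy / math.log2(n_classes)


# Illustrative labels only (not the profiled is_fraud column).
balanced = imbalance_score([0, 1] * 500)
skewed = imbalance_score([0] * 995 + [1] * 5)
```

A score close to 1, as in the alert above, suggests that plain accuracy would be a misleading metric for a classifier trained on such a target.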

Reproduction

Analysis started: 2022-12-29 04:32:54.573887
Analysis finished: 2022-12-29 04:37:55.884805
Duration: 5 minutes and 1.31 seconds
Software version: pandas-profiling v3.6.1
Download configuration: config.json

Variables

Unnamed: 0
Real number (ℝ)

HIGH CORRELATION  UNIFORM  UNIQUE 

Distinct: 1296675
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 648337
Minimum: 0
Maximum: 1296674
Zeros: 1
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 19.8 MiB

Quantile statistics

Minimum: 0
5-th percentile: 64833.7
Q1: 324168.5
median: 648337
Q3: 972505.5
95-th percentile: 1231840.3
Maximum: 1296674
Range: 1296674
Interquartile range (IQR): 648337
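
Because Unnamed: 0 is strictly increasing from 0 to 1296674 with no gaps, it behaves like a row index, and the quantiles above can be checked against a plain integer range. This is a minimal sketch that assumes linear interpolation between order statistics (NumPy's default); the report does not name its interpolation method.

```python
import numpy as np

n_rows = 1296675                 # number of observations from the overview
index = np.arange(n_rows)        # stand-in for the Unnamed: 0 column

# Linear interpolation: the q-quantile sits at position q * (n_rows - 1).
p05, q1, median, q3, p95 = np.quantile(index, [0.05, 0.25, 0.5, 0.75, 0.95])
iqr = q3 - q1
```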

Descriptive statistics

Standard deviation: 374317.97
Coefficient of variation (CV): 0.57735094
Kurtosis: -1.2
Mean: 648337
Median Absolute Deviation (MAD): 324169
Skewness: -5.1691189 × 10^-15
Sum: 8.4068238 × 10^11
Variance: 1.4011395 × 10^11
Monotonicity: Strictly increasing
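
The descriptive statistics are what one would expect from a gap-free integer index. The sketch below recomputes a few of them on a stand-in range; it assumes the sample standard deviation (`ddof=1`, the pandas default) and excess kurtosis, which is consistent with the reported values but not stated in the report.

```python
import numpy as np
import pandas as pd

# Stand-in for Unnamed: 0: the integers 0 .. 1296674.
index = pd.Series(np.arange(1296675))

std = index.std()                                # sample standard deviation (ddof=1)
cv = std / index.mean()                          # coefficient of variation, about 1/sqrt(3)
mad = (index - index.median()).abs().median()    # median absolute deviation
kurt = index.kurt()                              # excess kurtosis, about -1.2 for a uniform spread
```

A column with these properties carries no information beyond row order, which is why it is usually dropped or used as the index before modelling.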
Histogram with fixed size bins (bins=50)
Value                    Count    Frequency (%)
0                        1        < 0.1%
864447                   1        < 0.1%
864454                   1        < 0.1%
864453                   1        < 0.1%
864452                   1        < 0.1%
864451                   1        < 0.1%
864450                   1        < 0.1%
864449                   1        < 0.1%
864448                   1        < 0.1%
864446                   1        < 0.1%
Other values (1296665)   1296665  > 99.9%

Value    Count  Frequency (%)
0        1      < 0.1%
1        1      < 0.1%
2        1      < 0.1%
3        1      < 0.1%
4        1      < 0.1%
5        1      < 0.1%
6        1      < 0.1%
7        1      < 0.1%
8        1      < 0.1%
9        1      < 0.1%

Value    Count  Frequency (%)
1296674  1      < 0.1%
1296673  1      < 0.1%
1296672  1      < 0.1%
1296671  1      < 0.1%
1296670  1      < 0.1%
1296669  1      < 0.1%
1296668  1      < 0.1%
1296667  1      < 0.1%
1296666  1      < 0.1%
1296665  1      < 0.1%

trans_date_trans_time
Categorical

HIGH CARDINALITY  UNIFORM 

Distinct: 1274791
Distinct (%): 98.3%
Missing: 0
Missing (%): 0.0%
Memory size: 19.8 MiB
Value                   Count
2019-04-22 16:02:01     4
2020-06-01 01:37:47     4
2020-06-02 12:47:07     4
2019-11-18 23:03:49     3
2019-12-01 14:11:58     3
Other values (1274786)  1296657

Length

Max length: 19
Median length: 19
Mean length: 19
Min length: 19

Characters and Unicode

Total characters: 24636825
Distinct characters: 13
Distinct categories: 4
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1253218
Unique (%): 96.6%

Sample

1st row: 2019-01-01 00:00:18
2nd row: 2019-01-01 00:00:44
3rd row: 2019-01-01 00:00:51
4th row: 2019-01-01 00:01:16
5th row: 2019-01-01 00:03:06
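
The column is profiled as Categorical because its values are stored as 19-character strings. A possible preprocessing step, sketched here on the five sample rows only, is to parse it into a datetime type so that time-based features become available. The format string is inferred from the samples, and the derived feature names are illustrative.

```python
import pandas as pd

# The five sample rows shown in the report.
sample = pd.Series(
    [
        "2019-01-01 00:00:18",
        "2019-01-01 00:00:44",
        "2019-01-01 00:00:51",
        "2019-01-01 00:01:16",
        "2019-01-01 00:03:06",
    ],
    name="trans_date_trans_time",
)

# Explicit format avoids per-row format inference on large columns.
parsed = pd.to_datetime(sample, format="%Y-%m-%d %H:%M:%S")

hour = parsed.dt.hour                           # hour of day, 0-23
day_of_week = parsed.dt.dayofweek               # Monday=0 ... Sunday=6
gap_seconds = parsed.diff().dt.total_seconds()  # seconds since the previous row
```

Once parsed, the high-cardinality and uniform alerts for this column stop being meaningful, since they describe the string representation rather than the timestamps.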

Common Values

Value                   Count    Frequency (%)
2019-04-22 16:02:01     4        < 0.1%
2020-06-01 01:37:47     4        < 0.1%
2020-06-02 12:47:07     4        < 0.1%
2019-11-18 23:03:49     3        < 0.1%
2019-12-01 14:11:58     3        < 0.1%
2019-12-09 17:30:34     3        < 0.1%
2019-12-29 21:58:02     3        < 0.1%
2020-06-14 07:16:47     3        < 0.1%
2019-09-01 19:41:45     3        < 0.1%
2019-07-08 18:30:52     3        < 0.1%
Other values (1274781)  1296642  > 99.9%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
2019-12-08 6428
 
0.2%
2019-12-15 6425
 
0.2%
2019-12-22 6325
 
0.2%
2019-12-29 6320
 
0.2%
2019-12-01 6283
 
0.2%
2019-12-09 6252
 
0.2%
2019-12-02 6150
 
0.2%
2019-12-16 6127
 
0.2%
2019-12-30 6064
 
0.2%
2019-12-23 5937
 
0.2%
Other values (86927) 2531039
97.6%

Most occurring characters

Value             Count    Frequency (%)
0                 4537200  18.4%
2                 3577846  14.5%
1                 3411517  13.8%
-                 2593350  10.5%
:                 2593350  10.5%
9                 1488122  6.0%
(space)           1296675  5.3%
3                 1201556  4.9%
5                 1073153  4.4%
4                 1060414  4.3%
Other values (3)  1803642  7.3%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 18153450
73.7%
Dash Punctuation 2593350
 
10.5%
Other Punctuation 2593350
 
10.5%
Space Separator 1296675
 
5.3%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 4537200
25.0%
2 3577846
19.7%
1 3411517
18.8%
9 1488122
 
8.2%
3 1201556
 
6.6%
5 1073153
 
5.9%
4 1060414
 
5.8%
6 637897
 
3.5%
8 585293
 
3.2%
7 580452
 
3.2%
Dash Punctuation
ValueCountFrequency (%)
- 2593350
100.0%
Other Punctuation
ValueCountFrequency (%)
: 2593350
100.0%
Space Separator
ValueCountFrequency (%)
1296675
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 24636825
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0 4537200
18.4%
2 3577846
14.5%
1 3411517
13.8%
- 2593350
10.5%
: 2593350
10.5%
9 1488122
 
6.0%
1296675
 
5.3%
3 1201556
 
4.9%
5 1073153
 
4.4%
4 1060414
 
4.3%
Other values (3) 1803642
 
7.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII 24636825
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 4537200
18.4%
2 3577846
14.5%
1 3411517
13.8%
- 2593350
10.5%
: 2593350
10.5%
9 1488122
 
6.0%
1296675
 
5.3%
3 1201556
 
4.9%
5 1073153
 
4.4%
4 1060414
 
4.3%
Other values (3) 1803642
 
7.3%

cc_num
Real number (ℝ)

Distinct: 983
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4.1719204 × 10^17
Minimum: 6.0416207 × 10^10
Maximum: 4.9923464 × 10^18
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 19.8 MiB

Quantile statistics

Minimum: 6.0416207 × 10^10
5-th percentile: 6.3048488 × 10^11
Q1: 1.8004295 × 10^14
median: 3.5214173 × 10^15
Q3: 4.6422555 × 10^15
95-th percentile: 4.497914 × 10^18
Maximum: 4.9923464 × 10^18
Range: 4.9923463 × 10^18
Interquartile range (IQR): 4.4622125 × 10^15

Descriptive statistics

Standard deviation: 1.3088064 × 10^18
Coefficient of variation (CV): 3.1371798
Kurtosis: 6.1799499
Mean: 4.1719204 × 10^17
Median Absolute Deviation (MAD): 3.0764709 × 10^15
Skewness: 2.851879
Sum: -6.7255419 × 10^18
Variance: 1.7129743 × 10^36
Monotonicity: Not monotonic
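
The negative Sum is inconsistent with a column whose minimum is positive. One plausible explanation, not confirmed by the report, is 64-bit integer overflow: mean × observations is roughly 4.17 × 10^17 × 1296675 ≈ 5.4 × 10^23, far above the int64 maximum of about 9.22 × 10^18. The sketch below reproduces the effect on three made-up values and shows two workarounds; none of the values come from the dataset.

```python
import numpy as np

# Three made-up large identifiers; each fits in int64, their sum does not.
values = np.array([2**62, 2**62, 2**62], dtype=np.int64)

# NumPy reductions on int64 wrap around silently (modulo 2**64).
wrapped_sum = int(values.sum())

# Python integers have arbitrary precision, so this sum is exact.
exact_sum = sum(int(v) for v in values)

# Card numbers are identifiers, not quantities; storing them as text
# sidesteps arithmetic on them altogether.
as_text = [str(int(v)) for v in values]
```

Under that reading, the Mean, Sum, and Variance lines for cc_num are best treated as artefacts of profiling an identifier as a number.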
Histogram with fixed size bins (bins=50)
Value                 Count    Frequency (%)
5.713652351 × 10^11   3123     0.2%
4.512828415 × 10^18   3123     0.2%
3.672269902 × 10^13   3119     0.2%
2.131124026 × 10^14   3117     0.2%
3.54510934 × 10^15    3113     0.2%
6.534628261 × 10^15   3112     0.2%
6.011367958 × 10^15   3110     0.2%
2.720433096 × 10^15   3107     0.2%
6.011438889 × 10^15   3106     0.2%
6.011109737 × 10^15   3101     0.2%
Other values (973)    1265544  97.6%

Value                 Count  Frequency (%)
6.041620718 × 10^10   1518   0.1%
6.042292873 × 10^10   1531   0.1%
6.042309813 × 10^10   510    < 0.1%
6.042785159 × 10^10   528    < 0.1%
6.048700208 × 10^10   496    < 0.1%
6.04905963 × 10^10    1010   0.1%
6.049559311 × 10^10   518    < 0.1%
5.018029536 × 10^11   1559   0.1%
5.018181333 × 10^11   8      < 0.1%
5.018282048 × 10^11   515    < 0.1%

Value                 Count  Frequency (%)
4.992346398 × 10^18   2059   0.2%
4.989847571 × 10^18   1007   0.1%
4.980323468 × 10^18   532    < 0.1%
4.973530368 × 10^18   1040   0.1%
4.958589672 × 10^18   1476   0.1%
4.95682899 × 10^18    2566   0.2%
4.911818931 × 10^18   9      < 0.1%
4.906628656 × 10^18   2584   0.2%
4.897067971 × 10^18   1038   0.1%
4.890424427 × 10^18   1496   0.1%

merchant
Categorical

Distinct: 693
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 19.8 MiB
fraud_Kilback LLC
 
4403
fraud_Cormier LLC
 
3649
fraud_Schumm PLC
 
3634
fraud_Kuhn LLC
 
3510
fraud_Boyer PLC
 
3493
Other values (688)
1277986 

Length

Max length: 43
Median length: 36
Mean length: 23.132597
Min length: 13

Characters and Unicode

Total characters: 29995460
Distinct characters: 55
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: fraud_Rippin, Kub and Mann
2nd row: fraud_Heller, Gutmann and Zieme
3rd row: fraud_Lind-Buckridge
4th row: fraud_Kutch, Hermiston and Farrell
5th row: fraud_Keeling-Crist
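
Every sample value starts with `fraud_`, and the underscore count reported for this column (1296675) equals the number of rows, which suggests the prefix is present on every merchant name regardless of the fraud label. A minimal cleaning sketch, using only the five sample rows and assuming the prefix carries no information:

```python
import pandas as pd

# The five sample rows shown in the report.
merchant = pd.Series(
    [
        "fraud_Rippin, Kub and Mann",
        "fraud_Heller, Gutmann and Zieme",
        "fraud_Lind-Buckridge",
        "fraud_Kutch, Hermiston and Farrell",
        "fraud_Keeling-Crist",
    ],
    name="merchant",
)

# Confirm the prefix is universal before removing it.
has_prefix = merchant.str.startswith("fraud_")

# Anchored regex so only a leading "fraud_" is stripped.
cleaned = merchant.str.replace(r"^fraud_", "", regex=True)
```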

Common Values

ValueCountFrequency (%)
fraud_Kilback LLC 4403
 
0.3%
fraud_Cormier LLC 3649
 
0.3%
fraud_Schumm PLC 3634
 
0.3%
fraud_Kuhn LLC 3510
 
0.3%
fraud_Boyer PLC 3493
 
0.3%
fraud_Dickinson Ltd 3434
 
0.3%
fraud_Cummerata-Jones 2736
 
0.2%
fraud_Kutch LLC 2734
 
0.2%
fraud_Olson, Becker and Koch 2723
 
0.2%
fraud_Stroman, Hudson and Erdman 2721
 
0.2%
Other values (683) 1263638
97.5%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
and 474111
 
15.7%
llc 97780
 
3.2%
inc 91939
 
3.0%
sons 73145
 
2.4%
ltd 70853
 
2.3%
plc 66475
 
2.2%
group 50447
 
1.7%
fraud_kutch 10560
 
0.3%
fraud_schaefer 9394
 
0.3%
fraud_streich 9250
 
0.3%
Other values (804) 2069403
68.4%

Most occurring characters

ValueCountFrequency (%)
a 2910697
 
9.7%
r 2695758
 
9.0%
d 2139780
 
7.1%
e 1865710
 
6.2%
u 1857912
 
6.2%
n 1768848
 
5.9%
1726682
 
5.8%
f 1397378
 
4.7%
_ 1296675
 
4.3%
o 1129340
 
3.8%
Other values (45) 11206680
37.4%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 22698472
75.7%
Uppercase Letter 3398527
 
11.3%
Space Separator 1726682
 
5.8%
Connector Punctuation 1296675
 
4.3%
Dash Punctuation 445070
 
1.5%
Other Punctuation 430034
 
1.4%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 2910697
12.8%
r 2695758
11.9%
d 2139780
9.4%
e 1865710
 
8.2%
u 1857912
 
8.2%
n 1768848
 
7.8%
f 1397378
 
6.2%
o 1129340
 
5.0%
i 1080395
 
4.8%
t 873637
 
3.8%
Other values (15) 4979017
21.9%
Uppercase Letter
ValueCountFrequency (%)
L 477174
14.0%
C 312176
 
9.2%
S 301639
 
8.9%
B 278515
 
8.2%
H 260640
 
7.7%
K 216627
 
6.4%
G 192442
 
5.7%
R 181447
 
5.3%
M 179139
 
5.3%
P 159738
 
4.7%
Other values (15) 838990
24.7%
Other Punctuation
ValueCountFrequency (%)
, 400966
93.2%
' 29068
 
6.8%
Space Separator
ValueCountFrequency (%)
1726682
100.0%
Connector Punctuation
ValueCountFrequency (%)
_ 1296675
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 445070
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 26096999
87.0%
Common 3898461
 
13.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
a 2910697
 
11.2%
r 2695758
 
10.3%
d 2139780
 
8.2%
e 1865710
 
7.1%
u 1857912
 
7.1%
n 1768848
 
6.8%
f 1397378
 
5.4%
o 1129340
 
4.3%
i 1080395
 
4.1%
t 873637
 
3.3%
Other values (40) 8377544
32.1%
Common
ValueCountFrequency (%)
1726682
44.3%
_ 1296675
33.3%
- 445070
 
11.4%
, 400966
 
10.3%
' 29068
 
0.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII 29995460
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a 2910697
 
9.7%
r 2695758
 
9.0%
d 2139780
 
7.1%
e 1865710
 
6.2%
u 1857912
 
6.2%
n 1768848
 
5.9%
1726682
 
5.8%
f 1397378
 
4.7%
_ 1296675
 
4.3%
o 1129340
 
3.8%
Other values (45) 11206680
37.4%

category
Categorical

Distinct: 14
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 19.8 MiB
gas_transport
131659 
grocery_pos
123638 
home
123115 
shopping_pos
116672 
kids_pets
113035 
Other values (9)
688556 

Length

Max length14
Median length12
Mean length10.526079
Min length4

Characters and Unicode

Total characters13648903
Distinct characters20
Distinct categories2 ?
Distinct scripts2 ?
Distinct blocks1 ?

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowmisc_net
2nd rowgrocery_pos
3rd rowentertainment
4th rowgas_transport
5th rowmisc_pos

Common Values

Value             Count   Frequency (%)
gas_transport     131659  10.2%
grocery_pos       123638  9.5%
home              123115  9.5%
shopping_pos      116672  9.0%
kids_pets         113035  8.7%
shopping_net      97543   7.5%
entertainment     94014   7.3%
food_dining       91461   7.1%
personal_care     90758   7.0%
health_fitness    85879   6.6%
Other values (4)  228901  17.7%
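
With only 14 distinct labels across about 1.3 million rows, this column is a natural candidate for the pandas `category` dtype, which stores each label once plus a small integer code per row. The sketch below uses a synthetic series built from five of the labels above; the sizes it produces are illustrative and are not the report's memory figures.

```python
import pandas as pd

# Synthetic stand-in: five labels from the table above, repeated.
labels = ["gas_transport", "grocery_pos", "home", "shopping_pos", "kids_pets"]
as_strings = pd.Series(labels * 2000, name="category")

# Dictionary-encode: unique labels stored once, rows hold integer codes.
as_category = as_strings.astype("category")

string_bytes = as_strings.memory_usage(deep=True)
category_bytes = as_category.memory_usage(deep=True)
```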

Length

Histogram of lengths of the category
ValueCountFrequency (%)
gas_transport 131659
10.2%
grocery_pos 123638
9.5%
home 123115
9.5%
shopping_pos 116672
9.0%
kids_pets 113035
8.7%
shopping_net 97543
7.5%
entertainment 94014
7.3%
food_dining 91461
 
7.1%
personal_care 90758
 
7.0%
health_fitness 85879
 
6.6%
Other values (4) 228901
17.7%

Most occurring characters

ValueCountFrequency (%)
s 1429026
10.5%
e 1287345
9.4%
o 1231724
9.0%
n 1193757
8.7%
p 1083847
 
7.9%
t 1076942
 
7.9%
_ 1039039
 
7.6%
r 917535
 
6.7%
i 833007
 
6.1%
a 665234
 
4.9%
Other values (10) 2891447
21.2%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 12609864
92.4%
Connector Punctuation 1039039
 
7.6%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
s 1429026
11.3%
e 1287345
10.2%
o 1231724
9.8%
n 1193757
9.5%
p 1083847
8.6%
t 1076942
8.5%
r 917535
7.3%
i 833007
 
6.6%
a 665234
 
5.3%
g 606425
 
4.8%
Other values (9) 2285022
18.1%
Connector Punctuation
ValueCountFrequency (%)
_ 1039039
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 12609864
92.4%
Common 1039039
 
7.6%

Most frequent character per script

Latin
ValueCountFrequency (%)
s 1429026
11.3%
e 1287345
10.2%
o 1231724
9.8%
n 1193757
9.5%
p 1083847
8.6%
t 1076942
8.5%
r 917535
7.3%
i 833007
 
6.6%
a 665234
 
5.3%
g 606425
 
4.8%
Other values (9) 2285022
18.1%
Common
ValueCountFrequency (%)
_ 1039039
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 13648903
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
s 1429026
10.5%
e 1287345
9.4%
o 1231724
9.0%
n 1193757
8.7%
p 1083847
 
7.9%
t 1076942
 
7.9%
_ 1039039
 
7.6%
r 917535
 
6.7%
i 833007
 
6.1%
a 665234
 
4.9%
Other values (10) 2891447
21.2%

amt
Real number (ℝ)

Distinct: 52928
Distinct (%): 4.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 70.351035
Minimum: 1
Maximum: 28948.9
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 19.8 MiB

Quantile statistics

Minimum: 1
5-th percentile: 2.44
Q1: 9.65
median: 47.52
Q3: 83.14
95-th percentile: 196.31
Maximum: 28948.9
Range: 28947.9
Interquartile range (IQR): 73.49

Descriptive statistics

Standard deviation: 160.31604
Coefficient of variation (CV): 2.2788014
Kurtosis: 4545.645
Mean: 70.351035
Median Absolute Deviation (MAD): 37.5
Skewness: 42.277874
Sum: 91222429
Variance: 25701.232
Monotonicity: Not monotonic
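
A skewness of about 42 with a maximum near 29000 against a median of 47.52 indicates a long right tail. A common remedy before modelling, offered here as one option rather than something the report prescribes, is a logarithmic transform. The sketch uses synthetic log-normal amounts because the raw data is not part of this report; the distribution parameters are arbitrary.

```python
import numpy as np
import pandas as pd

# Synthetic right-skewed amounts (arbitrary parameters, not the real amt column).
rng = np.random.default_rng(0)
amt = pd.Series(rng.lognormal(mean=3.5, sigma=1.2, size=10_000), name="amt")

raw_skew = amt.skew()

# log1p(x) = log(1 + x) compresses the tail and is defined at x = 0.
log_amt = np.log1p(amt)
log_skew = log_amt.skew()
```

The transform is invertible with `np.expm1`, so predictions made on the log scale can be mapped back to currency units.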
Histogram with fixed size bins (bins=50)
Value                 Count    Frequency (%)
1.14                  542      < 0.1%
1.04                  538      < 0.1%
1.25                  535      < 0.1%
1.02                  533      < 0.1%
1.01                  523      < 0.1%
1.05                  519      < 0.1%
1.2                   516      < 0.1%
1.23                  515      < 0.1%
1.08                  512      < 0.1%
1.11                  509      < 0.1%
Other values (52918)  1291433  99.6%

Value  Count  Frequency (%)
1      222    < 0.1%
1.01   523    < 0.1%
1.02   533    < 0.1%
1.03   499    < 0.1%
1.04   538    < 0.1%
1.05   519    < 0.1%
1.06   471    < 0.1%
1.07   498    < 0.1%
1.08   512    < 0.1%
1.09   496    < 0.1%

Value     Count  Frequency (%)
28948.9   1      < 0.1%
27390.12  1      < 0.1%
27119.77  1      < 0.1%
26544.12  1      < 0.1%
25086.94  1      < 0.1%
17897.24  1      < 0.1%
15305.95  1      < 0.1%
15047.03  1      < 0.1%
15034.18  1      < 0.1%
14849.74  1      < 0.1%

first
Categorical

Distinct: 352
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 19.8 MiB
Christopher
 
26669
Robert
 
21667
Jessica
 
20581
James
 
20039
Michael
 
20009
Other values (347)
1187710 

Length

Max length11
Median length9
Mean length6.0804319
Min length3

Characters and Unicode

Total characters7884344
Distinct characters49
Distinct categories2 ?
Distinct scripts1 ?
Distinct blocks1 ?

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowJennifer
2nd rowStephanie
3rd rowEdward
4th rowJeremy
5th rowTyler

Common Values

ValueCountFrequency (%)
Christopher 26669
 
2.1%
Robert 21667
 
1.7%
Jessica 20581
 
1.6%
James 20039
 
1.5%
Michael 20009
 
1.5%
David 19965
 
1.5%
Jennifer 16940
 
1.3%
William 16371
 
1.3%
Mary 16346
 
1.3%
John 16325
 
1.3%
Other values (342) 1101763
85.0%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
christopher 26669
 
2.1%
robert 21667
 
1.7%
jessica 20581
 
1.6%
james 20039
 
1.5%
michael 20009
 
1.5%
david 19965
 
1.5%
jennifer 16940
 
1.3%
william 16371
 
1.3%
mary 16346
 
1.3%
john 16325
 
1.3%
Other values (342) 1101763
85.0%

Most occurring characters

ValueCountFrequency (%)
a 1007700
 
12.8%
e 860878
 
10.9%
i 618247
 
7.8%
n 614453
 
7.8%
r 607072
 
7.7%
l 388220
 
4.9%
h 344993
 
4.4%
s 324237
 
4.1%
t 311569
 
4.0%
o 268849
 
3.4%
Other values (39) 2538126
32.2%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 6587669
83.6%
Uppercase Letter 1296675
 
16.4%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 1007700
15.3%
e 860878
13.1%
i 618247
9.4%
n 614453
9.3%
r 607072
9.2%
l 388220
 
5.9%
h 344993
 
5.2%
s 324237
 
4.9%
t 311569
 
4.7%
o 268849
 
4.1%
Other values (16) 1241451
18.8%
Uppercase Letter
ValueCountFrequency (%)
J 218907
16.9%
M 144916
11.2%
S 114469
8.8%
A 112464
8.7%
C 106121
8.2%
D 86078
 
6.6%
K 85426
 
6.6%
R 70457
 
5.4%
T 66590
 
5.1%
L 62879
 
4.8%
Other values (13) 228368
17.6%

Most occurring scripts

ValueCountFrequency (%)
Latin 7884344
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
a 1007700
 
12.8%
e 860878
 
10.9%
i 618247
 
7.8%
n 614453
 
7.8%
r 607072
 
7.7%
l 388220
 
4.9%
h 344993
 
4.4%
s 324237
 
4.1%
t 311569
 
4.0%
o 268849
 
3.4%
Other values (39) 2538126
32.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII 7884344
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a 1007700
 
12.8%
e 860878
 
10.9%
i 618247
 
7.8%
n 614453
 
7.8%
r 607072
 
7.7%
l 388220
 
4.9%
h 344993
 
4.4%
s 324237
 
4.1%
t 311569
 
4.0%
o 268849
 
3.4%
Other values (39) 2538126
32.2%

last
Categorical

Distinct: 481
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 19.8 MiB
Smith
 
28794
Williams
 
23605
Davis
 
21910
Johnson
 
20034
Rodriguez
 
17394
Other values (476)
1184938 

Length

Max length11
Median length10
Mean length6.1111774
Min length2

Characters and Unicode

Total characters7924211
Distinct characters48
Distinct categories2 ?
Distinct scripts1 ?
Distinct blocks1 ?

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowBanks
2nd rowGill
3rd rowSanchez
4th rowWhite
5th rowGarcia

Common Values

ValueCountFrequency (%)
Smith 28794
 
2.2%
Williams 23605
 
1.8%
Davis 21910
 
1.7%
Johnson 20034
 
1.5%
Rodriguez 17394
 
1.3%
Martinez 14805
 
1.1%
Jones 13976
 
1.1%
Lewis 12753
 
1.0%
Gonzalez 11799
 
0.9%
Miller 11698
 
0.9%
Other values (471) 1119907
86.4%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
smith 28794
 
2.2%
williams 23605
 
1.8%
davis 21910
 
1.7%
johnson 20034
 
1.5%
rodriguez 17394
 
1.3%
martinez 14805
 
1.1%
jones 13976
 
1.1%
lewis 12753
 
1.0%
gonzalez 11799
 
0.9%
miller 11698
 
0.9%
Other values (471) 1119907
86.4%

Most occurring characters

ValueCountFrequency (%)
e 786302
 
9.9%
r 658748
 
8.3%
a 648005
 
8.2%
n 609178
 
7.7%
o 583517
 
7.4%
l 489180
 
6.2%
s 487668
 
6.2%
i 435378
 
5.5%
t 288591
 
3.6%
h 228981
 
2.9%
Other values (38) 2708663
34.2%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 6627536
83.6%
Uppercase Letter 1296675
 
16.4%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 786302
11.9%
r 658748
9.9%
a 648005
9.8%
n 609178
9.2%
o 583517
8.8%
l 489180
 
7.4%
s 487668
 
7.4%
i 435378
 
6.6%
t 288591
 
4.4%
h 228981
 
3.5%
Other values (15) 1411988
21.3%
Uppercase Letter
ValueCountFrequency (%)
M 158701
12.2%
W 106490
 
8.2%
S 105221
 
8.1%
C 93308
 
7.2%
B 84092
 
6.5%
R 83194
 
6.4%
H 81444
 
6.3%
G 75241
 
5.8%
J 71781
 
5.5%
P 66087
 
5.1%
Other values (13) 371116
28.6%

Most occurring scripts

ValueCountFrequency (%)
Latin 7924211
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 786302
 
9.9%
r 658748
 
8.3%
a 648005
 
8.2%
n 609178
 
7.7%
o 583517
 
7.4%
l 489180
 
6.2%
s 487668
 
6.2%
i 435378
 
5.5%
t 288591
 
3.6%
h 228981
 
2.9%
Other values (38) 2708663
34.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII 7924211
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 786302
 
9.9%
r 658748
 
8.3%
a 648005
 
8.2%
n 609178
 
7.7%
o 583517
 
7.4%
l 489180
 
6.2%
s 487668
 
6.2%
i 435378
 
5.5%
t 288591
 
3.6%
h 228981
 
2.9%
Other values (38) 2708663
34.2%

gender
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 19.8 MiB
F
709863 
M
586812 

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters1296675
Distinct characters2
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowF
2nd rowF
3rd rowM
4th rowM
5th rowM

Common Values

ValueCountFrequency (%)
F 709863
54.7%
M 586812
45.3%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
f 709863
54.7%
m 586812
45.3%

Most occurring characters

ValueCountFrequency (%)
F 709863
54.7%
M 586812
45.3%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 1296675
100.0%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
F 709863
54.7%
M 586812
45.3%

Most occurring scripts

ValueCountFrequency (%)
Latin 1296675
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
F 709863
54.7%
M 586812
45.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1296675
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
F 709863
54.7%
M 586812
45.3%

street
Categorical

Distinct: 983
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 19.8 MiB
0069 Robin Brooks Apt. 695
 
3123
864 Reynolds Plains
 
3123
8172 Robertson Parkways Suite 072
 
3119
4664 Sanchez Common Suite 930
 
3117
8030 Beck Motorway
 
3113
Other values (978)
1281080 

Length

Max length35
Median length29
Mean length22.229027
Min length12

Characters and Unicode

Total characters28823823
Distinct characters62
Distinct categories5 ?
Distinct scripts2 ?
Distinct blocks1 ?

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row561 Perry Cove
2nd row43039 Riley Greens Suite 393
3rd row594 White Dale Suite 530
4th row9443 Cynthia Court Apt. 038
5th row408 Bradley Rest

Common Values

ValueCountFrequency (%)
0069 Robin Brooks Apt. 695 3123
 
0.2%
864 Reynolds Plains 3123
 
0.2%
8172 Robertson Parkways Suite 072 3119
 
0.2%
4664 Sanchez Common Suite 930 3117
 
0.2%
8030 Beck Motorway 3113
 
0.2%
29606 Martinez Views Suite 653 3112
 
0.2%
1652 James Mews 3110
 
0.2%
854 Walker Dale Suite 488 3107
 
0.2%
40624 Rebecca Spurs 3106
 
0.2%
594 Berry Lights Apt. 392 3101
 
0.2%
Other values (973) 1265544
97.6%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
apt 327791
 
6.4%
suite 305467
 
5.9%
island 22954
 
0.4%
michael 18967
 
0.4%
common 17978
 
0.3%
station 17957
 
0.3%
islands 17917
 
0.3%
david 17476
 
0.3%
brooks 16991
 
0.3%
fields 16321
 
0.3%
Other values (1940) 4376722
84.9%

Most occurring characters

ValueCountFrequency (%)
3859866
 
13.4%
e 1792676
 
6.2%
a 1454190
 
5.0%
i 1296969
 
4.5%
t 1248091
 
4.3%
r 1103208
 
3.8%
n 1066149
 
3.7%
s 1034564
 
3.6%
l 889594
 
3.1%
o 875571
 
3.0%
Other values (52) 14202945
49.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 14413030
50.0%
Decimal Number 6996528
24.3%
Space Separator 3859866
 
13.4%
Uppercase Letter 3226608
 
11.2%
Other Punctuation 327791
 
1.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 1792676
12.4%
a 1454190
10.1%
i 1296969
9.0%
t 1248091
8.7%
r 1103208
 
7.7%
n 1066149
 
7.4%
s 1034564
 
7.2%
l 889594
 
6.2%
o 875571
 
6.1%
u 613916
 
4.3%
Other values (16) 3038102
21.1%
Uppercase Letter
ValueCountFrequency (%)
S 561924
17.4%
A 421707
13.1%
M 258180
 
8.0%
C 223839
 
6.9%
P 195864
 
6.1%
R 186303
 
5.8%
B 148676
 
4.6%
F 143149
 
4.4%
L 131665
 
4.1%
J 121164
 
3.8%
Other values (14) 834137
25.9%
Decimal Number
ValueCountFrequency (%)
5 748812
10.7%
3 739928
10.6%
2 734719
10.5%
7 703124
10.0%
1 693880
9.9%
8 692585
9.9%
6 677709
9.7%
0 677245
9.7%
4 669799
9.6%
9 658727
9.4%
Space Separator
ValueCountFrequency (%)
3859866
100.0%
Other Punctuation
ValueCountFrequency (%)
. 327791
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 17639638
61.2%
Common 11184185
38.8%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 1792676
 
10.2%
a 1454190
 
8.2%
i 1296969
 
7.4%
t 1248091
 
7.1%
r 1103208
 
6.3%
n 1066149
 
6.0%
s 1034564
 
5.9%
l 889594
 
5.0%
o 875571
 
5.0%
u 613916
 
3.5%
Other values (40) 6264710
35.5%
Common
ValueCountFrequency (%)
3859866
34.5%
5 748812
 
6.7%
3 739928
 
6.6%
2 734719
 
6.6%
7 703124
 
6.3%
1 693880
 
6.2%
8 692585
 
6.2%
6 677709
 
6.1%
0 677245
 
6.1%
4 669799
 
6.0%
Other values (2) 986518
 
8.8%

Most occurring blocks

ValueCountFrequency (%)
ASCII 28823823
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
3859866
 
13.4%
e 1792676
 
6.2%
a 1454190
 
5.0%
i 1296969
 
4.5%
t 1248091
 
4.3%
r 1103208
 
3.8%
n 1066149
 
3.7%
s 1034564
 
3.6%
l 889594
 
3.1%
o 875571
 
3.0%
Other values (52) 14202945
49.3%

city
Categorical

Distinct: 894
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 19.8 MiB
Birmingham
 
5617
San Antonio
 
5130
Utica
 
5105
Phoenix
 
5075
Meridian
 
5060
Other values (889)
1270688 

Length

Max length25
Median length21
Mean length8.6522459
Min length3

Characters and Unicode

Total characters11219151
Distinct characters52
Distinct categories4 ?
Distinct scripts2 ?
Distinct blocks1 ?

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowMoravian Falls
2nd rowOrient
3rd rowMalad City
4th rowBoulder
5th rowDoe Hill

Common Values

ValueCountFrequency (%)
Birmingham 5617
 
0.4%
San Antonio 5130
 
0.4%
Utica 5105
 
0.4%
Phoenix 5075
 
0.4%
Meridian 5060
 
0.4%
Thomas 4634
 
0.4%
Conway 4613
 
0.4%
Cleveland 4604
 
0.4%
Warren 4599
 
0.4%
Houston 4168
 
0.3%
Other values (884) 1248070
96.3%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
city 21314
 
1.3%
west 19473
 
1.2%
north 14425
 
0.9%
saint 14363
 
0.9%
falls 12794
 
0.8%
new 11842
 
0.7%
mount 11375
 
0.7%
lake 11249
 
0.7%
san 10260
 
0.6%
springs 8727
 
0.5%
Other values (918) 1482445
91.6%

Most occurring characters

ValueCountFrequency (%)
e 1090254
 
9.7%
a 935089
 
8.3%
n 821831
 
7.3%
o 817806
 
7.3%
l 781662
 
7.0%
r 748921
 
6.7%
i 704285
 
6.3%
t 598490
 
5.3%
s 446306
 
4.0%
321592
 
2.9%
Other values (42) 3952915
35.2%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 9277246
82.7%
Uppercase Letter 1619290
 
14.4%
Space Separator 321592
 
2.9%
Dash Punctuation 1023
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 1090254
11.8%
a 935089
10.1%
n 821831
8.9%
o 817806
8.8%
l 781662
 
8.4%
r 748921
 
8.1%
i 704285
 
7.6%
t 598490
 
6.5%
s 446306
 
4.8%
d 309005
 
3.3%
Other values (15) 2023597
21.8%
Uppercase Letter
ValueCountFrequency (%)
C 156587
 
9.7%
M 147711
 
9.1%
S 136036
 
8.4%
B 133396
 
8.2%
H 115641
 
7.1%
W 95433
 
5.9%
P 92084
 
5.7%
L 86511
 
5.3%
R 79150
 
4.9%
A 74999
 
4.6%
Other values (15) 501742
31.0%
Space Separator
ValueCountFrequency (%)
321592
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 1023
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 10896536
97.1%
Common 322615
 
2.9%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 1090254
 
10.0%
a 935089
 
8.6%
n 821831
 
7.5%
o 817806
 
7.5%
l 781662
 
7.2%
r 748921
 
6.9%
i 704285
 
6.5%
t 598490
 
5.5%
s 446306
 
4.1%
d 309005
 
2.8%
Other values (40) 3642887
33.4%
Common
ValueCountFrequency (%)
321592
99.7%
- 1023
 
0.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII 11219151
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 1090254
 
9.7%
a 935089
 
8.3%
n 821831
 
7.3%
o 817806
 
7.3%
l 781662
 
7.0%
r 748921
 
6.7%
i 704285
 
6.3%
t 598490
 
5.3%
s 446306
 
4.0%
321592
 
2.9%
Other values (42) 3952915
35.2%

state
Categorical

HIGH CARDINALITY  HIGH CORRELATION 

Distinct: 51
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 19.8 MiB
TX
94876 
NY
 
83501
PA
 
79847
CA
 
56360
OH
 
46480
Other values (46)
935611 

Length

Max length2
Median length2
Mean length2
Min length2

Characters and Unicode

Total characters2593350
Distinct characters24
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowNC
2nd rowWA
3rd rowID
4th rowMT
5th rowVA

Common Values

ValueCountFrequency (%)
TX 94876
 
7.3%
NY 83501
 
6.4%
PA 79847
 
6.2%
CA 56360
 
4.3%
OH 46480
 
3.6%
MI 46154
 
3.6%
IL 43252
 
3.3%
FL 42671
 
3.3%
AL 40989
 
3.2%
MO 38403
 
3.0%
Other values (41) 724142
55.8%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
tx 94876
 
7.3%
ny 83501
 
6.4%
pa 79847
 
6.2%
ca 56360
 
4.3%
oh 46480
 
3.6%
mi 46154
 
3.6%
il 43252
 
3.3%
fl 42671
 
3.3%
al 40989
 
3.2%
mo 38403
 
3.0%
Other values (41) 724142
55.8%

Most occurring characters

Value | Count | Frequency (%)
A | 355776 | 13.7%
N | 284464 | 11.0%
M | 220694 | 8.5%
I | 181993 | 7.0%
T | 154353 | 6.0%
L | 147877 | 5.7%
O | 144031 | 5.6%
C | 141011 | 5.4%
Y | 131298 | 5.1%
X | 94876 | 3.7%
Other values (14) | 736977 | 28.4%

Most occurring categories

Value | Count | Frequency (%)
Uppercase Letter | 2593350 | 100.0%

Most frequent character per category

Uppercase Letter
Value | Count | Frequency (%)
A | 355776 | 13.7%
N | 284464 | 11.0%
M | 220694 | 8.5%
I | 181993 | 7.0%
T | 154353 | 6.0%
L | 147877 | 5.7%
O | 144031 | 5.6%
C | 141011 | 5.4%
Y | 131298 | 5.1%
X | 94876 | 3.7%
Other values (14) | 736977 | 28.4%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 2593350 | 100.0%

Most frequent character per script

Latin
Value | Count | Frequency (%)
A | 355776 | 13.7%
N | 284464 | 11.0%
M | 220694 | 8.5%
I | 181993 | 7.0%
T | 154353 | 6.0%
L | 147877 | 5.7%
O | 144031 | 5.6%
C | 141011 | 5.4%
Y | 131298 | 5.1%
X | 94876 | 3.7%
Other values (14) | 736977 | 28.4%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 2593350 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
A | 355776 | 13.7%
N | 284464 | 11.0%
M | 220694 | 8.5%
I | 181993 | 7.0%
T | 154353 | 6.0%
L | 147877 | 5.7%
O | 144031 | 5.6%
C | 141011 | 5.4%
Y | 131298 | 5.1%
X | 94876 | 3.7%
Other values (14) | 736977 | 28.4%

zip
Real number (ℝ)

Distinct: 970
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 48800.671
Minimum: 1257
Maximum: 99783
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 19.8 MiB

Quantile statistics

Minimum: 1257
5-th percentile: 7208
Q1: 26237
median: 48174
Q3: 72042
95-th percentile: 94569
Maximum: 99783
Range: 98526
Interquartile range (IQR): 45805

Descriptive statistics

Standard deviation: 26893.222
Coefficient of variation (CV): 0.55108305
Kurtosis: -1.0964493
Mean: 48800.671
Median Absolute Deviation (MAD): 23068
Skewness: 0.079680758
Sum: 6.327861 × 10^10
Variance: 7.2324542 × 10^8
Monotonicity: Not monotonic
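
Several of the figures above are derived from the others, so they can be cross-checked by hand. The sketch below is only a sanity check on the reported numbers, not the profiler's code; it assumes the usual definitions (IQR = Q3 − Q1, CV = standard deviation / mean, variance = standard deviation squared) and takes the observation count of 1296675 from the report overview.

```python
# Figures copied from the zip tables; n is the report's observation count.
n = 1_296_675
mean, std = 48800.671, 26893.222
q1, q3 = 26237, 72042
minimum, maximum = 1257, 99783

iqr = q3 - q1                     # interquartile range
value_range = maximum - minimum   # range
cv = std / mean                   # coefficient of variation
total = mean * n                  # sum recovered from the rounded mean
variance = std ** 2               # variance recovered from the rounded std

print(iqr, value_range, round(cv, 4), f"{total:.4e}", f"{variance:.4e}")
```

Small differences in the last digits are expected, because the inputs are already rounded in the report.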
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
73754 | 3646 | 0.3%
34112 | 3613 | 0.3%
48088 | 3597 | 0.3%
82514 | 3527 | 0.3%
49628 | 3123 | 0.2%
15484 | 3123 | 0.2%
85173 | 3119 | 0.2%
29819 | 3117 | 0.2%
38761 | 3113 | 0.2%
5461 | 3112 | 0.2%
Other values (960) | 1263585 | 97.4%

Value | Count | Frequency (%)
1257 | 2023 | 0.2%
1330 | 1031 | 0.1%
1535 | 515 | < 0.1%
1545 | 1024 | 0.1%
1612 | 519 | < 0.1%
1843 | 2597 | 0.2%
1844 | 2058 | 0.2%
2180 | 519 | < 0.1%
2630 | 2090 | 0.2%
2908 | 550 | < 0.1%

Value | Count | Frequency (%)
99783 | 1568 | 0.1%
99747 | 12 | < 0.1%
99746 | 540 | < 0.1%
99323 | 2572 | 0.2%
99160 | 3030 | 0.2%
99116 | 15 | < 0.1%
99113 | 1047 | 0.1%
99033 | 2458 | 0.2%
98836 | 524 | < 0.1%
98665 | 500 | < 0.1%
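
Because zip is profiled as a number, values such as 1257 and 5461 appear with only four digits. US ZIP codes are five digits long, so these values have presumably lost a leading zero. The sketch below shows one way to restore the textual form; `zip_as_text` is a hypothetical helper and not part of the report.

```python
def zip_as_text(zip_value: int) -> str:
    """Render a numeric ZIP as a five-character string, restoring leading zeros."""
    return str(zip_value).zfill(5)


# Two short values and the maximum from the tables above.
for raw in (1257, 5461, 99783):
    print(raw, "->", zip_as_text(raw))
```

Treating the column as text would also stop the profiler from reporting statistics such as mean and skewness, which have no real meaning for postal codes.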

lat
Real number (ℝ)

Distinct: 968
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 38.537622
Minimum: 20.0271
Maximum: 66.6933
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 19.8 MiB

Quantile statistics

Minimum: 20.0271
5-th percentile: 29.8826
Q1: 34.6205
median: 39.3543
Q3: 41.9404
95-th percentile: 45.8433
Maximum: 66.6933
Range: 46.6662
Interquartile range (IQR): 7.3199

Descriptive statistics

Standard deviation: 5.0758084
Coefficient of variation (CV): 0.13171047
Kurtosis: 0.81296795
Mean: 38.537622
Median Absolute Deviation (MAD): 3.3597
Skewness: -0.18602768
Sum: 49970771
Variance: 25.763831
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
36.385 | 3646 | 0.3%
26.1184 | 3613 | 0.3%
42.5164 | 3597 | 0.3%
43.0048 | 3527 | 0.3%
39.8936 | 3123 | 0.2%
44.5995 | 3123 | 0.2%
33.2887 | 3119 | 0.2%
34.0326 | 3117 | 0.2%
33.4783 | 3113 | 0.2%
44.3346 | 3112 | 0.2%
Other values (958) | 1263585 | 97.4%

Value | Count | Frequency (%)
20.0271 | 1527 | 0.1%
20.0827 | 1032 | 0.1%
24.6557 | 2584 | 0.2%
26.1184 | 3613 | 0.3%
26.3304 | 542 | < 0.1%
26.3771 | 518 | < 0.1%
26.4215 | 3038 | 0.2%
26.4722 | 2524 | 0.2%
26.529 | 1549 | 0.1%
26.6939 | 1027 | 0.1%

Value | Count | Frequency (%)
66.6933 | 12 | < 0.1%
65.6899 | 540 | < 0.1%
64.7556 | 1568 | 0.1%
48.8878 | 3030 | 0.2%
48.8856 | 2066 | 0.2%
48.8328 | 1533 | 0.1%
48.6669 | 1047 | 0.1%
48.6031 | 2973 | 0.2%
48.4786 | 2038 | 0.2%
48.34 | 3088 | 0.2%

long
Real number (ℝ)

Distinct: 969
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: -90.226335
Minimum: -165.6723
Maximum: -67.9503
Zeros: 0
Zeros (%): 0.0%
Negative: 1296675
Negative (%): 100.0%
Memory size: 19.8 MiB

Quantile statistics

Minimum: -165.6723
5-th percentile: -119.0825
Q1: -96.798
median: -87.4769
Q3: -80.158
95-th percentile: -73.5112
Maximum: -67.9503
Range: 97.722
Interquartile range (IQR): 16.64

Descriptive statistics

Standard deviation: 13.759077
Coefficient of variation (CV): -0.15249513
Kurtosis: 1.8558923
Mean: -90.226335
Median Absolute Deviation (MAD): 8.1527
Skewness: -1.1501077
Sum: -1.1699423 × 10^8
Variance: 189.3122
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
-98.0727 | 3646 | 0.3%
-81.7361 | 3613 | 0.3%
-82.9832 | 3597 | 0.3%
-108.8964 | 3527 | 0.3%
-79.7856 | 3123 | 0.2%
-86.2141 | 3123 | 0.2%
-111.0985 | 3119 | 0.2%
-82.2027 | 3117 | 0.2%
-90.5142 | 3113 | 0.2%
-73.098 | 3112 | 0.2%
Other values (959) | 1263585 | 97.4%

Value | Count | Frequency (%)
-165.6723 | 1568 | 0.1%
-156.292 | 540 | < 0.1%
-155.488 | 1032 | 0.1%
-155.3697 | 1527 | 0.1%
-153.994 | 12 | < 0.1%
-124.4409 | 1043 | 0.1%
-124.2174 | 1547 | 0.1%
-124.1587 | 1031 | 0.1%
-124.1437 | 1526 | 0.1%
-123.9743 | 2036 | 0.2%

Value | Count | Frequency (%)
-67.9503 | 2080 | 0.2%
-68.5565 | 1014 | 0.1%
-69.2675 | 519 | < 0.1%
-69.4828 | 2050 | 0.2%
-69.9576 | 537 | < 0.1%
-69.9656 | 3107 | 0.2%
-70.1031 | 9 | < 0.1%
-70.239 | 1036 | 0.1%
-70.3001 | 2090 | 0.2%
-70.3457 | 1527 | 0.1%

city_pop
Real number (ℝ)

Distinct: 879
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 88824.441
Minimum: 23
Maximum: 2906700
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 19.8 MiB

Quantile statistics

Minimum: 23
5-th percentile: 139
Q1: 743
median: 2456
Q3: 20328
95-th percentile: 525713
Maximum: 2906700
Range: 2906677
Interquartile range (IQR): 19585

Descriptive statistics

Standard deviation: 301956.36
Coefficient of variation (CV): 3.3994738
Kurtosis: 37.614519
Mean: 88824.441
Median Absolute Deviation (MAD): 2198
Skewness: 5.5938531
Sum: 1.1517643 × 10^11
Variance: 9.1177644 × 10^10
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
606 | 5496 | 0.4%
1595797 | 5130 | 0.4%
1312922 | 5075 | 0.4%
1766 | 4574 | 0.4%
241 | 4533 | 0.3%
2906700 | 4168 | 0.3%
276002 | 4155 | 0.3%
302 | 4147 | 0.3%
910148 | 4073 | 0.3%
198 | 4067 | 0.3%
Other values (869) | 1251257 | 96.5%

Value | Count | Frequency (%)
23 | 2049 | 0.2%
37 | 1013 | 0.1%
43 | 2034 | 0.2%
46 | 3040 | 0.2%
47 | 511 | < 0.1%
49 | 1054 | 0.1%
51 | 1016 | 0.1%
52 | 518 | < 0.1%
53 | 2610 | 0.2%
60 | 1045 | 0.1%

Value | Count | Frequency (%)
2906700 | 4168 | 0.3%
2504700 | 2033 | 0.2%
2383912 | 521 | < 0.1%
1595797 | 5130 | 0.4%
1577385 | 2563 | 0.2%
1526206 | 3517 | 0.3%
1417793 | 8 | < 0.1%
1382480 | 2056 | 0.2%
1312922 | 5075 | 0.4%
1263321 | 3629 | 0.3%

job
Categorical

Distinct: 494
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 19.8 MiB

Value | Count
Film/video editor | 9779
Exhibition designer | 9199
Naval architect | 8684
Surveyor, land/geomatics | 8680
Materials engineer | 8270
Other values (489) | 1252063

Length

Max length: 59
Median length: 38
Mean length: 20.227102
Min length: 3

Characters and Unicode

Total characters: 26227978
Distinct characters: 53
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Psychologist, counselling
2nd row: Special educational needs teacher
3rd row: Nature conservation officer
4th row: Patent attorney
5th row: Dance movement psychotherapist

Common Values

Value | Count | Frequency (%)
Film/video editor | 9779 | 0.8%
Exhibition designer | 9199 | 0.7%
Naval architect | 8684 | 0.7%
Surveyor, land/geomatics | 8680 | 0.7%
Materials engineer | 8270 | 0.6%
Designer, ceramics/pottery | 8225 | 0.6%
Systems developer | 7700 | 0.6%
IT trainer | 7679 | 0.6%
Financial adviser | 7659 | 0.6%
Environmental consultant | 7547 | 0.6%
Other values (484) | 1213253 | 93.6%

Length

Histogram of lengths of the category
Value | Count | Frequency (%)
engineer | 131756 | 4.6%
officer | 110915 | 3.9%
manager | 61124 | 2.1%
scientist | 55878 | 1.9%
designer | 52218 | 1.8%
surveyor | 49062 | 1.7%
teacher | 38126 | 1.3%
psychologist | 32600 | 1.1%
research | 29754 | 1.0%
editor | 28725 | 1.0%
Other values (456) | 2289024 | 79.5%

Most occurring characters

Value | Count | Frequency (%)
e | 2803032 | 10.7%
i | 2386346 | 9.1%
r | 2198669 | 8.4%
a | 1813638 | 6.9%
t | 1782302 | 6.8%
n | 1764769 | 6.7%
(space) | 1582507 | 6.0%
o | 1491775 | 5.7%
s | 1444701 | 5.5%
c | 1323152 | 5.0%
Other values (43) | 7637087 | 29.1%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 22784440 | 86.9%
Space Separator | 1582507 | 6.0%
Uppercase Letter | 1369269 | 5.2%
Other Punctuation | 443484 | 1.7%
Close Punctuation | 24139 | 0.1%
Open Punctuation | 24139 | 0.1%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
e | 2803032 | 12.3%
i | 2386346 | 10.5%
r | 2198669 | 9.6%
a | 1813638 | 8.0%
t | 1782302 | 7.8%
n | 1764769 | 7.7%
o | 1491775 | 6.5%
s | 1444701 | 6.3%
c | 1323152 | 5.8%
l | 999624 | 4.4%
Other values (16) | 4776432 | 21.0%

Uppercase Letter
Value | Count | Frequency (%)
C | 156704 | 11.4%
E | 145426 | 10.6%
P | 143111 | 10.5%
S | 137500 | 10.0%
T | 113148 | 8.3%
M | 89545 | 6.5%
A | 88466 | 6.5%
F | 68651 | 5.0%
D | 58034 | 4.2%
R | 55841 | 4.1%
Other values (11) | 312843 | 22.8%

Other Punctuation
Value | Count | Frequency (%)
, | 312210 | 70.4%
/ | 123567 | 27.9%
' | 7707 | 1.7%

Space Separator
Value | Count | Frequency (%)
(space) | 1582507 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 24139 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 24139 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 24153709 | 92.1%
Common | 2074269 | 7.9%
7.9%

Most frequent character per script

Latin
Value | Count | Frequency (%)
e | 2803032 | 11.6%
i | 2386346 | 9.9%
r | 2198669 | 9.1%
a | 1813638 | 7.5%
t | 1782302 | 7.4%
n | 1764769 | 7.3%
o | 1491775 | 6.2%
s | 1444701 | 6.0%
c | 1323152 | 5.5%
l | 999624 | 4.1%
Other values (37) | 6145701 | 25.4%

Common
Value | Count | Frequency (%)
(space) | 1582507 | 76.3%
, | 312210 | 15.1%
/ | 123567 | 6.0%
) | 24139 | 1.2%
( | 24139 | 1.2%
' | 7707 | 0.4%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 26227978 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
e | 2803032 | 10.7%
i | 2386346 | 9.1%
r | 2198669 | 8.4%
a | 1813638 | 6.9%
t | 1782302 | 6.8%
n | 1764769 | 6.7%
(space) | 1582507 | 6.0%
o | 1491775 | 5.7%
s | 1444701 | 5.5%
c | 1323152 | 5.0%
Other values (43) | 7637087 | 29.1%

dob
Categorical

Distinct: 968
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 19.8 MiB

Value | Count
1977-03-23 | 5636
1981-08-29 | 4636
1988-09-15 | 4623
1955-05-06 | 3661
1995-07-12 | 3123
Other values (963) | 1274996

Length

Max length: 10
Median length: 10
Mean length: 10
Min length: 10

Characters and Unicode

Total characters: 12966750
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1988-03-09
2nd row: 1978-06-21
3rd row: 1962-01-19
4th row: 1967-01-12
5th row: 1986-03-28

Common Values

Value | Count | Frequency (%)
1977-03-23 | 5636 | 0.4%
1981-08-29 | 4636 | 0.4%
1988-09-15 | 4623 | 0.4%
1955-05-06 | 3661 | 0.3%
1995-07-12 | 3123 | 0.2%
1983-07-25 | 3123 | 0.2%
1987-10-28 | 3119 | 0.2%
1984-06-03 | 3117 | 0.2%
1999-03-05 | 3113 | 0.2%
1998-03-19 | 3112 | 0.2%
Other values (958) | 1259412 | 97.1%

Length

Histogram of lengths of the category
Value | Count | Frequency (%)
1977-03-23 | 5636 | 0.4%
1981-08-29 | 4636 | 0.4%
1988-09-15 | 4623 | 0.4%
1955-05-06 | 3661 | 0.3%
1995-07-12 | 3123 | 0.2%
1983-07-25 | 3123 | 0.2%
1987-10-28 | 3119 | 0.2%
1984-06-03 | 3117 | 0.2%
1999-03-05 | 3113 | 0.2%
1998-03-19 | 3112 | 0.2%
Other values (958) | 1259412 | 97.1%

Most occurring characters

Value | Count | Frequency (%)
- | 2593350 | 20.0%
1 | 2482923 | 19.1%
9 | 1846679 | 14.2%
0 | 1791076 | 13.8%
2 | 903212 | 7.0%
7 | 664815 | 5.1%
8 | 645604 | 5.0%
6 | 548041 | 4.2%
5 | 536159 | 4.1%
3 | 484324 | 3.7%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 10373400 | 80.0%
Dash Punctuation | 2593350 | 20.0%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
1 | 2482923 | 23.9%
9 | 1846679 | 17.8%
0 | 1791076 | 17.3%
2 | 903212 | 8.7%
7 | 664815 | 6.4%
8 | 645604 | 6.2%
6 | 548041 | 5.3%
5 | 536159 | 5.2%
3 | 484324 | 4.7%
4 | 470567 | 4.5%

Dash Punctuation
Value | Count | Frequency (%)
- | 2593350 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 12966750 | 100.0%

Most frequent character per script

Common
Value | Count | Frequency (%)
- | 2593350 | 20.0%
1 | 2482923 | 19.1%
9 | 1846679 | 14.2%
0 | 1791076 | 13.8%
2 | 903212 | 7.0%
7 | 664815 | 5.1%
8 | 645604 | 5.0%
6 | 548041 | 4.2%
5 | 536159 | 4.1%
3 | 484324 | 3.7%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 12966750 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
- | 2593350 | 20.0%
1 | 2482923 | 19.1%
9 | 1846679 | 14.2%
0 | 1791076 | 13.8%
2 | 903212 | 7.0%
7 | 664815 | 5.1%
8 | 645604 | 5.0%
6 | 548041 | 4.2%
5 | 536159 | 4.1%
3 | 484324 | 3.7%

trans_num
Categorical

HIGH CARDINALITY  UNIFORM  UNIQUE 

Distinct: 1296675
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 19.8 MiB

Value | Count
0b242abb623afc578575680df30655b9 | 1
c85864e7e7cf0be6d1b8597977b8afea | 1
1a8a2a05638a5503cc6bb8d5735efcc1 | 1
4556eaf1f7def06eb500325cde4d054e | 1
5e915d9f88bd09cee9655a470d9bc0bd | 1
Other values (1296670) | 1296670

Length

Max length: 32
Median length: 32
Mean length: 32
Min length: 32

Characters and Unicode

Total characters: 41493600
Distinct characters: 16
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1296675
Unique (%): 100.0%

Sample

1st row: 0b242abb623afc578575680df30655b9
2nd row: 1f76529f8574734946361c461b024d99
3rd row: a1a22d70485983eac12b5b88dad1cf95
4th row: 6b849c168bdad6f867558c3793159a81
5th row: a41d7549acf90789359a9aa5346dcb46

Common Values

Value | Count | Frequency (%)
0b242abb623afc578575680df30655b9 | 1 | < 0.1%
c85864e7e7cf0be6d1b8597977b8afea | 1 | < 0.1%
1a8a2a05638a5503cc6bb8d5735efcc1 | 1 | < 0.1%
4556eaf1f7def06eb500325cde4d054e | 1 | < 0.1%
5e915d9f88bd09cee9655a470d9bc0bd | 1 | < 0.1%
4e0080ea32b67dc251ea824d55ba1f6f | 1 | < 0.1%
541a9a3880dae40c9e7778117adbc89f | 1 | < 0.1%
2c602fbe0404b65cc431b059ed167518 | 1 | < 0.1%
6f9d22d80c0c48e238ecc484d1c64a49 | 1 | < 0.1%
c766663cba6e1a1df3623e4f9d6472de | 1 | < 0.1%
Other values (1296665) | 1296665 | > 99.9%

Length

Histogram of lengths of the category
Value | Count | Frequency (%)
0b242abb623afc578575680df30655b9 | 1 | < 0.1%
c1d9a7ddb1e34639fe82758de97f4abf | 1 | < 0.1%
189a841a0a8ba03058526bcfe566aab5 | 1 | < 0.1%
83ec1cc84142af6e2acf10c44949e720 | 1 | < 0.1%
6d294ed2cc447d2c71c7171a3d54967c | 1 | < 0.1%
fc28024ce480f8ef21a32d64c93a29f5 | 1 | < 0.1%
7bb25a43205191eb7344282b88fc54d3 | 1 | < 0.1%
3b9014ea8fb80bd65de0b1463b00b00e | 1 | < 0.1%
3c74776e558f1499a7824b556e474b1d | 1 | < 0.1%
413636e759663f264aae1819a4d4f231 | 1 | < 0.1%
Other values (1296665) | 1296665 | > 99.9%

Most occurring characters

Value | Count | Frequency (%)
2 | 2596593 | 6.3%
9 | 2596375 | 6.3%
7 | 2595084 | 6.3%
4 | 2594676 | 6.3%
a | 2594103 | 6.3%
d | 2593816 | 6.3%
3 | 2593713 | 6.3%
f | 2593666 | 6.3%
5 | 2593098 | 6.2%
e | 2592759 | 6.2%
Other values (6) | 15549717 | 37.5%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 25937791 | 62.5%
Lowercase Letter | 15555809 | 37.5%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
2 | 2596593 | 10.0%
9 | 2596375 | 10.0%
7 | 2595084 | 10.0%
4 | 2594676 | 10.0%
3 | 2593713 | 10.0%
5 | 2593098 | 10.0%
1 | 2592577 | 10.0%
8 | 2592342 | 10.0%
0 | 2591678 | 10.0%
6 | 2591655 | 10.0%

Lowercase Letter
Value | Count | Frequency (%)
a | 2594103 | 16.7%
d | 2593816 | 16.7%
f | 2593666 | 16.7%
e | 2592759 | 16.7%
c | 2592326 | 16.7%
b | 2589139 | 16.6%

Most occurring scripts

Value | Count | Frequency (%)
Common | 25937791 | 62.5%
Latin | 15555809 | 37.5%

Most frequent character per script

Common
Value | Count | Frequency (%)
2 | 2596593 | 10.0%
9 | 2596375 | 10.0%
7 | 2595084 | 10.0%
4 | 2594676 | 10.0%
3 | 2593713 | 10.0%
5 | 2593098 | 10.0%
1 | 2592577 | 10.0%
8 | 2592342 | 10.0%
0 | 2591678 | 10.0%
6 | 2591655 | 10.0%

Latin
Value | Count | Frequency (%)
a | 2594103 | 16.7%
d | 2593816 | 16.7%
f | 2593666 | 16.7%
e | 2592759 | 16.7%
c | 2592326 | 16.7%
b | 2589139 | 16.6%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 41493600 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
2 | 2596593 | 6.3%
9 | 2596375 | 6.3%
7 | 2595084 | 6.3%
4 | 2594676 | 6.3%
a | 2594103 | 6.3%
d | 2593816 | 6.3%
3 | 2593713 | 6.3%
f | 2593666 | 6.3%
5 | 2593098 | 6.2%
e | 2592759 | 6.2%
Other values (6) | 15549717 | 37.5%

unix_time
Real number (ℝ)

Distinct: 1274823
Distinct (%): 98.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.3492436 × 10^9
Minimum: 1.325376 × 10^9
Maximum: 1.3718168 × 10^9
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 19.8 MiB

Quantile statistics

Minimum: 1.325376 × 10^9
5-th percentile: 1.328672 × 10^9
Q1: 1.3387507 × 10^9
median: 1.3492497 × 10^9
Q3: 1.3593854 × 10^9
95-th percentile: 1.3698306 × 10^9
Maximum: 1.3718168 × 10^9
Range: 46440799
Interquartile range (IQR): 20634633

Descriptive statistics

Standard deviation: 12841278
Coefficient of variation (CV): 0.0095173904
Kurtosis: -1.0875405
Mean: 1.3492436 × 10^9
Median Absolute Deviation (MAD): 10358807
Skewness: 0.0033779498
Sum: 1.7495305 × 10^15
Variance: 1.6489843 × 10^14
Monotonicity: Increasing
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
1370177227 | 4 | < 0.1%
1335110521 | 4 | < 0.1%
1370050667 | 4 | < 0.1%
1367602155 | 3 | < 0.1%
1364686521 | 3 | < 0.1%
1369587838 | 3 | < 0.1%
1337306743 | 3 | < 0.1%
1343668520 | 3 | < 0.1%
1341944714 | 3 | < 0.1%
1340650327 | 3 | < 0.1%
Other values (1274813) | 1296642 | > 99.9%

Value | Count | Frequency (%)
1325376018 | 1 | < 0.1%
1325376044 | 1 | < 0.1%
1325376051 | 1 | < 0.1%
1325376076 | 1 | < 0.1%
1325376186 | 1 | < 0.1%
1325376248 | 1 | < 0.1%
1325376282 | 1 | < 0.1%
1325376308 | 1 | < 0.1%
1325376318 | 1 | < 0.1%
1325376361 | 1 | < 0.1%

Value | Count | Frequency (%)
1371816817 | 1 | < 0.1%
1371816816 | 1 | < 0.1%
1371816752 | 1 | < 0.1%
1371816739 | 1 | < 0.1%
1371816728 | 1 | < 0.1%
1371816696 | 1 | < 0.1%
1371816683 | 1 | < 0.1%
1371816656 | 1 | < 0.1%
1371816562 | 1 | < 0.1%
1371816522 | 1 | < 0.1%
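
The unix_time values are easier to read as calendar timestamps. The sketch below assumes they are seconds since the Unix epoch in UTC, which the report does not state; `to_utc` is a hypothetical helper. It converts the smallest and largest values from the tables above.

```python
from datetime import datetime, timezone


def to_utc(seconds: int) -> datetime:
    """Interpret `seconds` as a Unix timestamp and return an aware UTC datetime."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


first = to_utc(1325376018)   # smallest unix_time in the report
last = to_utc(1371816817)    # largest unix_time in the report

print(first.isoformat())
print(last.isoformat())
print(last - first)          # span covered by the column
```

Under this assumption the smallest value falls on 2012-01-01, whereas the first row of the Sample section shows a trans_date_trans_time of 2019-01-01 00:00:18. The two columns therefore appear to be offset by several years, which is worth checking before both are used as time features.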

merch_lat
Real number (ℝ)

Distinct: 1247805
Distinct (%): 96.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 38.537338
Minimum: 19.027785
Maximum: 67.510267
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 19.8 MiB

Quantile statistics

Minimum: 19.027785
5-th percentile: 29.751653
Q1: 34.733572
median: 39.36568
Q3: 41.957164
95-th percentile: 46.00353
Maximum: 67.510267
Range: 48.482482
Interquartile range (IQR): 7.223592

Descriptive statistics

Standard deviation: 5.1097884
Coefficient of variation (CV): 0.13259318
Kurtosis: 0.79599391
Mean: 38.537338
Median Absolute Deviation (MAD): 3.397536
Skewness: -0.18191543
Sum: 49970403
Variance: 26.109937
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
41.305966 | 4 | < 0.1%
41.937796 | 4 | < 0.1%
42.265012 | 4 | < 0.1%
41.301611 | 4 | < 0.1%
34.134994 | 4 | < 0.1%
37.669788 | 4 | < 0.1%
39.348185 | 4 | < 0.1%
32.64469 | 4 | < 0.1%
42.749184 | 4 | < 0.1%
38.050673 | 4 | < 0.1%
Other values (1247795) | 1296635 | > 99.9%

Value | Count | Frequency (%)
19.027785 | 1 | < 0.1%
19.027804 | 1 | < 0.1%
19.029798 | 1 | < 0.1%
19.031242 | 1 | < 0.1%
19.032277 | 1 | < 0.1%
19.033288 | 1 | < 0.1%
19.034282 | 1 | < 0.1%
19.034687 | 1 | < 0.1%
19.035472 | 1 | < 0.1%
19.036312 | 1 | < 0.1%

Value | Count | Frequency (%)
67.510267 | 1 | < 0.1%
67.441518 | 1 | < 0.1%
67.397018 | 1 | < 0.1%
67.188111 | 1 | < 0.1%
67.064277 | 1 | < 0.1%
66.835174 | 1 | < 0.1%
66.682905 | 1 | < 0.1%
66.67355 | 1 | < 0.1%
66.664673 | 1 | < 0.1%
66.659242 | 1 | < 0.1%

merch_long
Real number (ℝ)

Distinct: 1275745
Distinct (%): 98.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: -90.226465
Minimum: -166.67124
Maximum: -66.950902
Zeros: 0
Zeros (%): 0.0%
Negative: 1296675
Negative (%): 100.0%
Memory size: 19.8 MiB

Quantile statistics

Minimum: -166.67124
5-th percentile: -119.33009
Q1: -96.897276
median: -87.438392
Q3: -80.236796
95-th percentile: -73.354218
Maximum: -66.950902
Range: 99.72034
Interquartile range (IQR): 16.660479

Descriptive statistics

Standard deviation: 13.771091
Coefficient of variation (CV): -0.15262806
Kurtosis: 1.8484792
Mean: -90.226465
Median Absolute Deviation (MAD): 8.227889
Skewness: -1.1469599
Sum: -1.169944 × 10^8
Variance: 189.64294
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
-87.116414 | 4 | < 0.1%
-81.219189 | 4 | < 0.1%
-74.618269 | 4 | < 0.1%
-85.326323 | 3 | < 0.1%
-84.890305 | 3 | < 0.1%
-88.49309 | 3 | < 0.1%
-84.100102 | 3 | < 0.1%
-97.527227 | 3 | < 0.1%
-85.3444 | 3 | < 0.1%
-86.037494 | 3 | < 0.1%
Other values (1275735) | 1296642 | > 99.9%

Value | Count | Frequency (%)
-166.671242 | 1 | < 0.1%
-166.670132 | 1 | < 0.1%
-166.669638 | 1 | < 0.1%
-166.666179 | 1 | < 0.1%
-166.664828 | 1 | < 0.1%
-166.662888 | 1 | < 0.1%
-166.661968 | 1 | < 0.1%
-166.659277 | 1 | < 0.1%
-166.657834 | 1 | < 0.1%
-166.657174 | 1 | < 0.1%

Value | Count | Frequency (%)
-66.950902 | 1 | < 0.1%
-66.955996 | 1 | < 0.1%
-66.95654 | 1 | < 0.1%
-66.958659 | 1 | < 0.1%
-66.958751 | 1 | < 0.1%
-66.959178 | 1 | < 0.1%
-66.961923 | 1 | < 0.1%
-66.962913 | 1 | < 0.1%
-66.963918 | 1 | < 0.1%
-66.963975 | 1 | < 0.1%

is_fraud
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 19.8 MiB

Value | Count
0 | 1289169
1 | 7506

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 1296675
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 0
5th row: 0

Common Values

Value | Count | Frequency (%)
0 | 1289169 | 99.4%
1 | 7506 | 0.6%
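
The class counts above make the degree of imbalance easy to quantify. The sketch below computes the class shares and an entropy-based imbalance score, defined here as 1 minus the Shannon entropy divided by its maximum for two classes. That formula is an assumption about how the alert figure is derived, not something the report states, although for these counts it comes out close to the 94.9% shown in the Alerts section.

```python
import math

# Counts copied from the Common Values table for is_fraud.
counts = {"0": 1_289_169, "1": 7_506}
total = sum(counts.values())
shares = {label: c / total for label, c in counts.items()}

# Shannon entropy in bits, normalised by the maximum for this many classes.
entropy = -sum(p * math.log2(p) for p in shares.values() if p > 0)
imbalance = 1 - entropy / math.log2(len(counts))

print(f"share of class 1: {shares['1']:.2%}")
print(f"class 0 to class 1 ratio: {counts['0'] / counts['1']:.0f} to 1")
print(f"imbalance score: {imbalance:.1%}")
```

With a positive class this rare, plain accuracy is a poor yardstick: a model that always predicts 0 would already be right for about 99.4% of rows.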

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
0 | 1289169 | 99.4%
1 | 7506 | 0.6%

Most occurring characters

Value | Count | Frequency (%)
0 | 1289169 | 99.4%
1 | 7506 | 0.6%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 1296675 | 100.0%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 1289169 | 99.4%
1 | 7506 | 0.6%

Most occurring scripts

Value | Count | Frequency (%)
Common | 1296675 | 100.0%

Most frequent character per script

Common
Value | Count | Frequency (%)
0 | 1289169 | 99.4%
1 | 7506 | 0.6%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 1296675 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
0 | 1289169 | 99.4%
1 | 7506 | 0.6%

Interactions

[Interaction plots: 100 panels; images not preserved]

Correlations

2022-12-29T10:08:03.008679image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
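(variable) | Unnamed: 0 | cc_num | amt | zip | lat | long | city_pop | unix_time | merch_lat | merch_long | category | gender | state | is_fraud
Unnamed: 0 | 1.000 | 0.002 | 0.001 | 0.001 | 0.001 | -0.001 | -0.003 | 1.000 | 0.001 | -0.001 | 0.001 | 0.000 | 0.004 | 0.019
cc_num | 0.002 | 1.000 | -0.001 | 0.013 | -0.004 | -0.013 | 0.049 | 0.002 | -0.004 | -0.013 | 0.009 | 0.051 | 0.238 | 0.006
amt | 0.001 | -0.001 | 1.000 | 0.001 | 0.012 | -0.000 | -0.024 | 0.001 | 0.012 | 0.000 | 0.020 | 0.000 | 0.004 | 0.000
zip | 0.001 | 0.013 | 0.001 | 1.000 | -0.162 | -0.959 | -0.040 | 0.001 | -0.162 | -0.957 | 0.011 | 0.119 | 0.968 | 0.005
lat | 0.001 | -0.004 | 0.012 | -0.162 | 1.000 | 0.106 | -0.265 | 0.001 | 0.991 | 0.105 | 0.011 | 0.101 | 0.800 | 0.008
long | -0.001 | -0.013 | -0.000 | -0.959 | 0.106 | 1.000 | 0.087 | -0.001 | 0.106 | 0.998 | 0.009 | 0.091 | 0.923 | 0.006
city_pop | -0.003 | 0.049 | -0.024 | -0.040 | -0.265 | 0.087 | 1.000 | -0.003 | -0.264 | 0.086 | 0.014 | 0.089 | 0.313 | 0.004
unix_time | 1.000 | 0.002 | 0.001 | 0.001 | 0.001 | -0.001 | -0.003 | 1.000 | 0.001 | -0.001 | 0.001 | 0.000 | 0.004 | 0.018
merch_lat | 0.001 | -0.004 | 0.012 | -0.162 | 0.991 | 0.106 | -0.264 | 0.001 | 1.000 | 0.104 | 0.011 | 0.103 | 0.812 | 0.008
merch_long | -0.001 | -0.013 | 0.000 | -0.957 | 0.105 | 0.998 | 0.086 | -0.001 | 0.104 | 1.000 | 0.009 | 0.082 | 0.885 | 0.005
category | 0.001 | 0.009 | 0.020 | 0.011 | 0.011 | 0.009 | 0.014 | 0.001 | 0.011 | 0.009 | 1.000 | 0.054 | 0.019 | 0.071
gender | 0.000 | 0.051 | 0.000 | 0.119 | 0.101 | 0.091 | 0.089 | 0.000 | 0.103 | 0.082 | 0.054 | 1.000 | 0.256 | 0.008
state | 0.004 | 0.238 | 0.004 | 0.968 | 0.800 | 0.923 | 0.313 | 0.004 | 0.812 | 0.885 | 0.019 | 0.256 | 1.000 | 0.037
is_fraud | 0.019 | 0.006 | 0.000 | 0.005 | 0.008 | 0.006 | 0.004 | 0.018 | 0.008 | 0.005 | 0.071 | 0.008 | 0.037 | 1.000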
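
One way to read the matrix is to list the variable pairs whose coefficient is large in absolute value. The sketch below does this for a five-variable excerpt (zip, lat, long, merch_lat, merch_long) copied from the table in this section. The 0.5 cutoff and the helper name `strong_pairs` are assumptions made for illustration; the report does not state which threshold or correlation measure produced its "High correlation" alerts.

```python
import pandas as pd

# Excerpt of the correlation matrix from this section (values copied as shown).
cols = ["zip", "lat", "long", "merch_lat", "merch_long"]
values = [
    [1.000, -0.162, -0.959, -0.162, -0.957],
    [-0.162, 1.000, 0.106, 0.991, 0.105],
    [-0.959, 0.106, 1.000, 0.106, 0.998],
    [-0.162, 0.991, 0.106, 1.000, 0.104],
    [-0.957, 0.105, 0.998, 0.104, 1.000],
]
corr = pd.DataFrame(values, index=cols, columns=cols)

# Assumed cutoff for calling a pair "strongly correlated" (illustrative only).
THRESHOLD = 0.5


def strong_pairs(matrix: pd.DataFrame, threshold: float) -> list:
    """Return (a, b, r) for each unordered pair with |r| >= threshold."""
    names = list(matrix.columns)
    found = []
    for i, a in enumerate(names):
        # Only look right of the diagonal so each pair is reported once.
        for b in names[i + 1:]:
            r = float(matrix.loc[a, b])
            if abs(r) >= threshold:
                found.append((a, b, r))
    return found


pairs = strong_pairs(corr, THRESHOLD)
for a, b, r in pairs:
    print(f"{a} ~ {b}: {r:+.3f}")
```

On this excerpt the longitude-like variables (zip, long, merch_long) pair up with each other and lat pairs with merch_lat, while latitude and longitude are only weakly related.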

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion visually.
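
The counts behind a nullity chart can be reproduced with plain pandas. The sketch below is a minimal illustration, not the report's own code: it builds a small frame from three rows and four columns of the Sample section and counts missing values per column. The variable names `sample`, `null_counts`, `null_share` and `total_missing` are chosen here for the example.

```python
import pandas as pd

# Three rows and four columns copied from the Sample section of this report.
sample = pd.DataFrame(
    {
        "category": ["misc_net", "grocery_pos", "entertainment"],
        "amt": [4.97, 107.23, 220.11],
        "state": ["NC", "WA", "ID"],
        "is_fraud": [0, 0, 0],
    }
)

# isna() marks each cell True when it is missing; summing counts per column.
null_counts = sample.isna().sum()

# The mean of the boolean mask is the fraction of missing cells per column.
null_share = sample.isna().mean()

# Total number of missing cells across the whole frame.
total_missing = int(null_counts.sum())

print(null_counts)
print(f"missing cells: {total_missing}")
```

For these rows every column is complete, which is consistent with the "Missing cells 0" figure in the dataset statistics.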

Sample

First rows

(index) | Unnamed: 0 | trans_date_trans_time | cc_num | merchant | category | amt | first | last | gender | street | city | state | zip | lat | long | city_pop | job | dob | trans_num | unix_time | merch_lat | merch_long | is_fraud
0 | 0 | 2019-01-01 00:00:18 | 2703186189652095 | fraud_Rippin, Kub and Mann | misc_net | 4.97 | Jennifer | Banks | F | 561 Perry Cove | Moravian Falls | NC | 28654 | 36.0788 | -81.1781 | 3495 | Psychologist, counselling | 1988-03-09 | 0b242abb623afc578575680df30655b9 | 1325376018 | 36.011293 | -82.048315 | 0
1 | 1 | 2019-01-01 00:00:44 | 630423337322 | fraud_Heller, Gutmann and Zieme | grocery_pos | 107.23 | Stephanie | Gill | F | 43039 Riley Greens Suite 393 | Orient | WA | 99160 | 48.8878 | -118.2105 | 149 | Special educational needs teacher | 1978-06-21 | 1f76529f8574734946361c461b024d99 | 1325376044 | 49.159047 | -118.186462 | 0
2 | 2 | 2019-01-01 00:00:51 | 38859492057661 | fraud_Lind-Buckridge | entertainment | 220.11 | Edward | Sanchez | M | 594 White Dale Suite 530 | Malad City | ID | 83252 | 42.1808 | -112.2620 | 4154 | Nature conservation officer | 1962-01-19 | a1a22d70485983eac12b5b88dad1cf95 | 1325376051 | 43.150704 | -112.154481 | 0
3 | 3 | 2019-01-01 00:01:16 | 3534093764340240 | fraud_Kutch, Hermiston and Farrell | gas_transport | 45.00 | Jeremy | White | M | 9443 Cynthia Court Apt. 038 | Boulder | MT | 59632 | 46.2306 | -112.1138 | 1939 | Patent attorney | 1967-01-12 | 6b849c168bdad6f867558c3793159a81 | 1325376076 | 47.034331 | -112.561071 | 0
4 | 4 | 2019-01-01 00:03:06 | 375534208663984 | fraud_Keeling-Crist | misc_pos | 41.96 | Tyler | Garcia | M | 408 Bradley Rest | Doe Hill | VA | 24433 | 38.4207 | -79.4629 | 99 | Dance movement psychotherapist | 1986-03-28 | a41d7549acf90789359a9aa5346dcb46 | 1325376186 | 38.674999 | -78.632459 | 0
5 | 5 | 2019-01-01 00:04:08 | 4767265376804500 | fraud_Stroman, Hudson and Erdman | gas_transport | 94.63 | Jennifer | Conner | F | 4655 David Island | Dublin | PA | 18917 | 40.3750 | -75.2045 | 2158 | Transport planner | 1961-06-19 | 189a841a0a8ba03058526bcfe566aab5 | 1325376248 | 40.653382 | -76.152667 | 0
6 | 6 | 2019-01-01 00:04:42 | 30074693890476 | fraud_Rowe-Vandervort | grocery_net | 44.54 | Kelsey | Richards | F | 889 Sarah Station Suite 624 | Holcomb | KS | 67851 | 37.9931 | -100.9893 | 2691 | Arboriculturist | 1993-08-16 | 83ec1cc84142af6e2acf10c44949e720 | 1325376282 | 37.162705 | -100.153370 | 0
7 | 7 | 2019-01-01 00:05:08 | 6011360759745864 | fraud_Corwin-Collins | gas_transport | 71.65 | Steven | Williams | M | 231 Flores Pass Suite 720 | Edinburg | VA | 22824 | 38.8432 | -78.6003 | 6018 | Designer, multimedia | 1947-08-21 | 6d294ed2cc447d2c71c7171a3d54967c | 1325376308 | 38.948089 | -78.540296 | 0
8 | 8 | 2019-01-01 00:05:18 | 4922710831011201 | fraud_Herzog Ltd | misc_pos | 4.27 | Heather | Chase | F | 6888 Hicks Stream Suite 954 | Manor | PA | 15665 | 40.3359 | -79.6607 | 1472 | Public affairs consultant | 1941-03-07 | fc28024ce480f8ef21a32d64c93a29f5 | 1325376318 | 40.351813 | -79.958146 | 0
9 | 9 | 2019-01-01 00:06:01 | 2720830304681674 | fraud_Schoen, Kuphal and Nitzsche | grocery_pos | 198.39 | Melissa | Aguilar | F | 21326 Taylor Squares Suite 708 | Clarksville | TN | 37040 | 36.5220 | -87.3490 | 151785 | Pathologist | 1974-03-28 | 3b9014ea8fb80bd65de0b1463b00b00e | 1325376361 | 37.179198 | -87.485381 | 0

Last rows

(index) | Unnamed: 0 | trans_date_trans_time | cc_num | merchant | category | amt | first | last | gender | street | city | state | zip | lat | long | city_pop | job | dob | trans_num | unix_time | merch_lat | merch_long | is_fraud
1296665 | 1296665 | 2020-06-21 12:08:42 | 213193596103206 | fraud_Gulgowski LLC | home | 72.17 | James | Hunt | M | 7369 Gabriel Tunnel | Pointe Aux Pins | MI | 49775 | 45.7549 | -84.4470 | 95 | Electrical engineer | 1994-02-09 | 108c103b26f686c24c021aaf4210977e | 1371816522 | 44.938461 | -83.996234 | 0
1296666 | 1296666 | 2020-06-21 12:09:22 | 4587657402165341815 | fraud_Hyatt, Russel and Gleichner | health_fitness | 7.30 | Amber | Lewis | F | 6296 John Keys Suite 858 | Pembroke Township | IL | 60958 | 41.0646 | -87.5917 | 2135 | Psychotherapist, child | 2004-05-08 | 37a18c6fb0c5c722b6339ffedc82f55a | 1371816562 | 40.556811 | -88.092339 | 0
1296667 | 1296667 | 2020-06-21 12:10:56 | 4822367783500458 | fraud_Hahn, Douglas and Schowalter | travel | 19.71 | Christopher | Farrell | M | 97070 Anderson Land | Haines City | FL | 33844 | 28.0758 | -81.5929 | 33804 | Exercise physiologist | 1991-01-01 | 34e72e0a659a6c8f4a20ee65594f3a7d | 1371816656 | 27.465871 | -81.511804 | 0
1296668 | 1296668 | 2020-06-21 12:11:23 | 213141712584544 | fraud_Metz, Russel and Metz | kids_pets | 100.85 | Margaret | Curtis | F | 742 Oneill Shore | Florence | MS | 39073 | 32.1530 | -90.1217 | 19685 | Fine artist | 1984-12-24 | 0d86d8c17638d7eff77db9c6a878b477 | 1371816683 | 31.377697 | -90.528450 | 0
1296669 | 1296669 | 2020-06-21 12:11:36 | 4400011257587661852 | fraud_Stiedemann Inc | misc_pos | 37.38 | Marissa | Powell | F | 474 Allen Haven | North Loup | NE | 68859 | 41.4972 | -98.7858 | 509 | Nurse, children's | 1980-09-15 | 9a7ea2625cf8303efe34e3c09546868f | 1371816696 | 41.728638 | -99.039660 | 0
1296670 | 1296670 | 2020-06-21 12:12:08 | 30263540414123 | fraud_Reichel Inc | entertainment | 15.56 | Erik | Patterson | M | 162 Jessica Row Apt. 072 | Hatch | UT | 84735 | 37.7175 | -112.4777 | 258 | Geoscientist | 1961-11-24 | 440b587732da4dc1a6395aba5fb41669 | 1371816728 | 36.841266 | -111.690765 | 0
1296671 | 1296671 | 2020-06-21 12:12:19 | 6011149206456997 | fraud_Abernathy and Sons | food_dining | 51.70 | Jeffrey | White | M | 8617 Holmes Terrace Suite 651 | Tuscarora | MD | 21790 | 39.2667 | -77.5101 | 100 | Production assistant, television | 1979-12-11 | 278000d2e0d2277d1de2f890067dcc0a | 1371816739 | 38.906881 | -78.246528 | 0
1296672 | 1296672 | 2020-06-21 12:12:32 | 3514865930894695 | fraud_Stiedemann Ltd | food_dining | 105.93 | Christopher | Castaneda | M | 1632 Cohen Drive Suite 639 | High Rolls Mountain Park | NM | 88325 | 32.9396 | -105.8189 | 899 | Naval architect | 1967-08-30 | 483f52fe67fabef353d552c1e662974c | 1371816752 | 33.619513 | -105.130529 | 0
1296673 | 1296673 | 2020-06-21 12:13:36 | 2720012583106919 | fraud_Reinger, Weissnat and Strosin | food_dining | 74.90 | Joseph | Murray | M | 42933 Ryan Underpass | Manderson | SD | 57756 | 43.3526 | -102.5411 | 1126 | Volunteer coordinator | 1980-08-18 | d667cdcbadaaed3da3f4020e83591c83 | 1371816816 | 42.788940 | -103.241160 | 0
1296674 | 1296674 | 2020-06-21 12:13:37 | 4292902571056973207 | fraud_Langosh, Wintheiser and Hyatt | food_dining | 4.30 | Jeffrey | Smith | M | 135 Joseph Mountains | Sula | MT | 59871 | 45.8433 | -113.8748 | 218 | Therapist, horticultural | 1995-08-16 | 8f7c8e4ab7f25875d753b422917c98c9 | 1371816817 | 46.565983 | -114.186110 | 0